DETAILED ACTION

1. This communication is in response to the Arguments filed on 02/05/2008. Claims
1-33 remain pending and have been examined. The Applicants' amendment and 
remarks have been carefully considered, but they are not persuasive and do not place 
the claims in condition for allowance. Accordingly, this action has been made FINAL. 

2. All previous objections and rejections directed to the Applicant's disclosure and 
claims not discussed in this Office Action have been withdrawn by the Examiner. 
Further, the IDS submitted on 02/05/2008 has been considered by the Examiner. 



Response to Arguments 

3. Applicant's arguments filed on 02/05/2008 (pages 2-8) have been fully 
considered but they are not persuasive. 

As to claim 1, the Applicant first argues that the combination of Kushner in
view of Durlach does not teach the limitation of a whitening unit for combining white noise
with the frames input from the preprocessing unit. The Examiner traverses this
argument. In response to applicant's arguments against the references individually, one
cannot show nonobviousness by attacking references individually where the rejections
are based on combinations of references. See In re Keller, 642 F.2d 413, 208
USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir.
1986). Kushner was cited to teach the preprocessing unit for dividing a signal
into frames (see col. 4, lines 6-7). The Durlach reference was combined to
teach the limitation of adding white noise to a signal (see Durlach col. 5, lines 56-65
and Figure 2, target signals (speech) 50a, 50b, and 50n are added with the noise
generator 60 by mixer 56). Durlach teaches adding noise to a signal. The signal can be
segmented when noise is added, as processing occurs once the signal is received (see
Durlach Figure 2). Hence, the stated limitations are taught by the combination of
Kushner in view of Durlach.
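
For illustration only, the combination discussed above (dividing a signal into frames and then adding noise to each frame) can be sketched as follows. This is a minimal sketch, not the implementation of any cited reference: the function names, the frame length, the Gaussian noise model, and the placeholder samples are all assumptions made for the example.

```python
import random

def split_into_frames(signal, frame_len):
    """Split a list of samples into consecutive, non-overlapping frames.

    Trailing samples that do not fill a whole frame are dropped.
    """
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, frame_len)]

def add_white_noise(frame, noise_std, rng):
    """Return a copy of the frame with zero-mean Gaussian noise added per sample."""
    return [s + rng.gauss(0.0, noise_std) for s in frame]

rng = random.Random(0)                      # fixed seed, for repeatability only
signal = [float(i % 8) for i in range(32)]  # placeholder samples, not real speech
frames = split_into_frames(signal, frame_len=8)
noisy_frames = [add_white_noise(f, noise_std=0.1, rng=rng) for f in frames]
```

Whether the noise is mixed in before or after segmentation does not change the per-sample result in this sketch, since each sample receives an independent noise value either way.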

In response to the second argument, the Applicant argues that treating the use of white
noise as well known in the art is an improper Official Notice. The Examiner traverses this
argument by showing that the use of white noise insertion is common in the art.
Specifically, this is seen in patent 5,768,474 by Neti in col. 6, lines 4-29. Further, 
motivation is also given for adding noise in order to produce a realistic noise 
environment (see col. 6, lines 24-34) where speech distortion may occur. Hence, the 
adding of white noise is common when simulating noisy environments. 

In response to the third argument, the Applicant
argues that one skilled in the art would have no motivation to combine Kushner and
Durlach since Durlach uses multiple microphones and Kushner only one. The Examiner
traverses this argument by citing Durlach, col. 5, line 54 to col. 6, line 16. In the cited sections,
directional information with content information is input into a microphone. It should be
noted that time-varying signals S1(t) to Sn(t) exist. If n=0 then we have the single signal
case. Hence, in this case one microphone is needed, which in this example is 50a. As
stated by the Applicant, the Kushner reference includes a single microphone, and hence
the combination with Durlach would not change the principle of operation as the Applicant
suggests, since when n=0 the single microphone case occurs. Hence, such a
combination is sufficient.

In response to the fourth argument presented by the Applicant, the Applicant 
argues that Eryilmaz does not teach the extraction of random parameters indicating 
frame randomness. The Examiner traverses this argument. Eryilmaz does teach the 
random parameter extraction unit (see col. 2, lines 30-42, speech energy value 
determined) for extracting random parameters indicating the randomness of frames 
(see col. 2, lines 30-55, voice activity is detected based on the comparison to a 
threshold. The determination of energy is random since it is not known whether the 
frame is voice or noise. Further, the randomness is addressed by indicating the noise or 
voice present in the signal). The indication of voice activity is in itself random since the
outcome is unknown when making a determination of speech or noise. The random
parameters that are extracted are the energy values determined for the frame (see col. 4,
lines 5-18, especially equation 1). Hence, for a frame or a series of frames, a different
energy value may exist which indicates whether speech is present or not. It is
determined that speech exists when above thresholds and non-speech when not within 
the threshold (see col. 2, lines 38-54). Hence, the extraction of random parameters 
indicating randomness of frames is taught by the cited reference. 
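
For illustration only, a per-frame energy value compared against a threshold, as described above, might look like the following. This is a hedged sketch: the mean-square energy formula is an assumed stand-in for the cited equation 1, which is not reproduced in this action, and the function names, sample values, and threshold are hypothetical.

```python
def frame_energy(frame):
    """Mean-square energy of one frame (assumed form, not the cited equation 1)."""
    return sum(s * s for s in frame) / len(frame)

def is_voice(frame, threshold):
    """Classify a frame as voice when its energy exceeds the threshold."""
    return frame_energy(frame) > threshold

quiet = [0.01, -0.02, 0.01, 0.0]   # hypothetical low-energy frame
loud = [0.5, -0.6, 0.4, -0.5]      # hypothetical high-energy frame
labels = [is_voice(f, threshold=0.05) for f in (quiet, loud)]
```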

In response to applicant's argument that the examiner's conclusion of 
obviousness is based upon improper hindsight reasoning, it must be recognized that 
any judgment on obviousness is in a sense necessarily a reconstruction based upon 
hindsight reasoning. But so long as it takes into account only knowledge which was 
within the level of ordinary skill at the time the claimed invention was made, and does
not include knowledge gleaned only from the applicant's disclosure, such a 
reconstruction is proper. See In re McLaughlin, 443 F.2d 1392, 170 USPQ 209 (CCPA 
1971). 

Claim Rejections - 35 USC § 103 

4. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

5. Claims 1, 2, 18, and 19 are rejected under 35 U.S.C. 103(a) as being unpatentable
over Kushner et al. (US 6,862,567) in view of Durlach et al. (US 5,828,997) in view of
Eryilmaz (US 5,867,574).

As to claims 1 and 18, Kushner et al. teaches a voice region detection apparatus,
comprising: 

a preprocessing unit for dividing an input voice signal into frames (see 
col.4, lines 6-7, segments acquisition window into frames.); 

a frame state determination unit for classifying the frames into voice
frames and noise frames (see col. 4, lines 29-37, speech/noise classifier done by
microprocessor 110) based on the random parameters extracted by the random
parameter extraction unit; and

a voice region detection unit (see col. 5, lines 9-14, microprocessor 110
determines the starting point and ending point of the speech utterance) for
detecting a voice region by calculating start (see col. 6, lines 8-9 and Figure 2,
microprocessor 110 determines the starting point and ending point of the speech
utterance) and end positions of a voice (see col. 6, lines 65-66 and Figure
2, endpoint is determined) based on the voice and noise frames input from the frame
state determination unit (e.g. from the determination of a speech utterance, the
voice regions are detected based on energy).

However, Kushner et al. does not specifically teach the whitening unit for
combining white noise with the input frames.

Durlach et al. does teach the whitening unit combining white noise with the
input frames (see col. 5, lines 56-65 and Figure 2, target signals (speech) 50a,
50b, and 50n are added with the noise generator 60 by mixer 56). Although white
noise is not used when adding to the target signals, it would have been obvious
to add white noise or any other type of noise to a signal, depending on the
environment simulated.

It would have been obvious to one of ordinary skill in the art at the time
the invention was made to have modified the voice recognition as taught by
Kushner et al. with the addition of a whitening unit as taught by Durlach et al. The
motivation to have combined the references involves the ability to incorporate the
directionality of a signal for sound localization (see Durlach et al., col. 5, lines 60-65),
as would benefit the preprocessed signal from Kushner et al. for real-time
environmental simulation.

However, Kushner et al. in view of Durlach et al. do not specifically teach 
the random parameter extraction unit. 

Eryilmaz does teach the random parameter extraction unit (see col. 2, 
lines 30-42, speech energy value determined) for extracting random parameters 
indicating the randomness of frames (see col. 2, lines 30-55, voice activity is 
detected based on the comparison to a threshold. The determination of energy is 
random since it is not known whether the frame is voice or noise. Further, the 
randomness is addressed by indicating the noise or voice present in the signal). 

It would have been obvious to one of ordinary skill in the art at the time
the invention was made to have modified the voice recognition as taught by
Kushner et al. in view of Durlach et al. with the addition of a random parameter
extraction unit as taught by Eryilmaz for the purpose of classifying the voice
frames and noise frames as taught by Kushner et al. in view of Durlach et al. by a
method for classification.



As to claims 2 and 19, Kushner et al. in view of Durlach et al. in view of Eryilmaz
teach all of the limitations as in claims 1 and 18 above. 

Furthermore, Kushner et al. teaches wherein the preprocessing unit
samples the input voice signal according to a predetermined frequency (see col.
3, line 66 to col. 4, line 10, digitize) and divides the sampled voice signal into a
plurality of frames (see col. 4, lines 4-10, segmentation into frames is performed)
(e.g. the digitization of the voice signal makes the use of a sampling frequency
obvious as the signal is sent to the microprocessor for further processing. It is
obvious that this sampling frequency is chosen according to the Nyquist criterion.)

As to claims 4 and 21, Kushner et al. in view of Durlach et al. in view of Eryilmaz
teach all of the limitations as in claims 1 and 18 above. 

Furthermore, Durlach et al. teaches wherein the whitening unit comprises 
a white noise generation unit (see Figure 2, noise generator 60) for generating 
the white noise, and a signal synthesizing unit (see Figure 2, mixer 56) for 
combining the frames input from the preprocessing unit (see signals 50a, 50b, 
and 50n) with the white noise generated by the white noise generation unit (e.g. 
Noise is added to the target signal.). 

6. Claims 3 and 20 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
Kushner et al. in view of Durlach et al. in view of Eryilmaz as applied to claims 2 and 19
above, and further in view of Mekuria (US 6,182,035). 

As to claims 3 and 20, Kushner et al. in view of Durlach et al. in view of Eryilmaz 
teach all of the limitations as in claims 2 and 19 above. 

However, Kushner et al. in view of Durlach et al. in view of Eryilmaz do not 
specifically teach the frames overlapping with one another. 

Mekuria does teach the overlapping of frames (see col. 8, lines 28-29).

It would have been obvious to one of ordinary skill in the art at the time
the invention was made to have combined voice recognition as taught by
Kushner et al. in view of Durlach et al. in view of Eryilmaz with the overlapping of
frames as taught by Mekuria. The motivation to have combined the references
involves the use of samples in more than one frame (see Mekuria col. 8, lines 28-29).
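
For illustration only, overlapping frames in which a sample belongs to more than one frame can be sketched as below. The function name, frame length, and hop size are assumptions chosen for the example and are not taken from Mekuria.

```python
def overlapping_frames(signal, frame_len, hop):
    """Split samples into frames of frame_len that start every hop samples.

    With hop < frame_len, neighbouring frames share frame_len - hop samples.
    """
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

samples = list(range(10))                      # placeholder sample indices
frames = overlapping_frames(samples, frame_len=4, hop=2)
# Each frame shares its last two samples with the start of the next frame.
```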

7. Claims 5 and 22 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
Kushner et al. in view of Durlach et al. in view of Eryilmaz as applied to claims 1 and 18
above, and further in view of Davis et al. (US 2003/0216909).

As to claim 5, Kushner et al. in view of Durlach et al. in view of Eryilmaz teach all 
of the limitations as in claims 1 and 18 above.

Furthermore, Durlach et al. does teach the whitening unit combining white 
noise to the input frames (see col. 5, lines 56-65 and Figure 2, target signals 
(speech) 50a, 50b, and 50n are added with the noise generator 60 by mixer 56). 

Furthermore, Eryilmaz does teach the random parameter extraction unit 
(see col. 4, lines 30-42, speech energy value determined). 

However, Kushner et al. in view of Durlach et al. in view of Eryilmaz do not
specifically teach the calculation of the number of runs consisting of consecutive
identical elements in the frames.

Davis does teach the calculation of the number of runs (see [0046],
consecutive frames that fulfill the energy threshold; if x number of frames
meet the requirement then it is determined that speech is present, otherwise
noise or non-speech is present) consisting of consecutive identical elements in
the frames.

It would have been obvious to one of ordinary skill in the art at the time
the invention was made to have combined voice recognition as taught by 
Kushner et al. in view of Durlach et al. in view of Eryilmaz with consecutive runs 
of the random parameter as taught by Davis et al. The motivation to have 
combined the references involves the ability for the VAD processor to reduce
the chance of short-term events triggering a VAD state change (see Davis et al.
[0046]). 
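
For illustration only, requiring a number of consecutive frames to meet the energy requirement before speech is declared can be sketched as follows. This is a minimal sketch under stated assumptions: the function name, the per-frame flags, and the value of min_run (the "x number of frames") are hypothetical and are not taken from Davis.

```python
def speech_decisions(frame_flags, min_run):
    """Declare speech only once min_run consecutive frames meet the requirement.

    frame_flags holds one boolean per frame (True = energy requirement met).
    """
    decisions = []
    run = 0
    for flag in frame_flags:
        run = run + 1 if flag else 0   # length of the current run of True flags
        decisions.append(run >= min_run)
    return decisions

flags = [True, False, True, True, True, False]
decisions = speech_decisions(flags, min_run=3)
```

In this sketch the isolated first frame does not change the decision, which is the kind of short-term event such a run requirement is meant to ignore.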

8. Claims 7-9 and 24-26 are rejected under 35 U.S.C. 103(a) as being unpatentable
over Kushner et al. in view of Durlach et al. in view of Eryilmaz as applied to claims 1 
and 18 above, and further in view of Pastor (US 5,572,623). 

As to claims 7 and 24, Kushner et al. in view of Durlach et al. in view of Eryilmaz 
teach all of the limitations as in claims 1 and 18, above. 

However, Kushner et al. in view of Durlach et al. in view of Eryilmaz do not 
specifically teach the voice frames including vocal frames and fricative frames. 

Pastor does teach the frames including vocal and fricative frames (see col. 
4, lines 66-67 and col. 5, lines 5-14). 

It would have been obvious to one of ordinary skill in the art at the time
the invention was made to have combined the voice recognition as taught by
Kushner et al. in view of Durlach et al. in view of Eryilmaz with the inclusion of
fricative frames as taught by Pastor. The motivation to have combined the
references involves the inclusion of fricatives that are present at the start and
end of speech (see Pastor col. 1, lines 29-33).

As to claims 8, 9, 25, and 26, Kushner et al. in view of Durlach et al. in
view of Eryilmaz in view of Pastor teach all of the limitations as in claims 7 and 
24, above. 

Furthermore, Eryilmaz teaches wherein the frame state determination unit
(e.g. voice activity detector) determines that if the random parameter of a frame
extracted by the random parameter extraction unit is below a first threshold (see
col. 2, lines 30-55, voice activity is detected based on the comparison to a
threshold. The determination of energy is random since it is not known whether
the frame is voice or noise.), then it is a vocal frame (e.g. if the noise is below
the value of the threshold, then speech is present, i.e. a vocal frame). The use of a
specific threshold would have been obvious to one skilled in the art in order to
distinguish voice from noise. Hence, the use of below or above a threshold is a
matter of design choice and relativity. The Applicants do not indicate reasons for
selecting the stated thresholds (see Applicant's Specification, page 11, lines 17-21).

9. Claims 10, 11, 27, and 28 are rejected under 35 U.S.C. 103(a) as being
unpatentable over Kushner et al. in view of Durlach et al. in view of Eryilmaz in view of
Pastor as applied to claims 8 and 25 above, and further in view of Chong-White et al.
(US 7,065,485).

As to claims 10, 11, 27, and 28, Kushner et al. in view of Durlach et al. in
view of Eryilmaz in view of Pastor teach all of the limitations as in claims 8 and 
25, above. 

However, Kushner et al. in view of Durlach et al. in view of Eryilmaz in 
view of Pastor do not specifically teach if the random parameter of a frame 
extracted by the random parameter extraction unit is above a second threshold, 
the relevant frame is a fricative frame. 

Chong-White et al. does teach if the random parameter of a frame
extracted by the random parameter extraction unit (see col. 7, lines 22-25,
energy ratio computed, similar to Eryilmaz) is above a second threshold (see col.
7, lines 46-47, fricatives identified when above a threshold), the relevant frame is
a fricative frame. As to claims 11 and 28, it would have been obvious to select a
threshold value for comparing different types of values of a signal with respect to
a ratio.

It would have been obvious to one of ordinary skill in the art at the time
the invention was made to have combined the voice recognition as taught by
Kushner et al. in view of Durlach et al. in view of Eryilmaz in view of Pastor with
the inclusion of a threshold indicating a fricative as taught by Chong-White et al.
The motivation to have combined the references involves the ability to detect
further unvoiced components in a signal consisting of speech and non-speech.
Furthermore, the use of the voice recognition as taught by Kushner et al. in view
of Durlach et al. in view of Eryilmaz in view of Pastor allows the detection of
noise, voice and fricatives contained in the signal.

As to claims 12, 13, 29, and 30, Kushner et al. in view of Durlach et al. in
view of Eryilmaz in view of Pastor in view of Chong-White teach all of the 
limitations as in claims 8 and 25, above. 

Furthermore, Eryilmaz teaches wherein the frame state determination unit 
determines that if the random parameter of the frame extracted by the random 
parameter extraction unit is below the second threshold, the relevant frame is a 
noise frame (see col. 2, lines 30-55, voice activity is detected based on the 
comparison to a threshold. The determination of energy is random since it is not 
known whether the frame is voice or noise). 

However, Eryilmaz does not specifically teach the use of two thresholds 
for comparison. 

It would have been obvious to use multiple thresholds for classifying each
frame so that voice and fricative frames can be detected as
taught by Chong-White above. Further, the values for the thresholds used are a
matter of design choice based on the thresholds computed.
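
For illustration only, classifying a frame with two thresholds, in the manner discussed for claims 8 through 13 and 25 through 30, can be sketched as follows. This is a hedged sketch: it assumes the random parameter is a single number per frame and that the first threshold is lower than the second, neither of which is stated in this action, and the threshold values shown are hypothetical.

```python
def classify_frame(param, first_threshold, second_threshold):
    """Map one frame's random parameter to a frame class.

    Assumes first_threshold < second_threshold (an assumption of this sketch).
    """
    if param < first_threshold:
        return "vocal"        # below the first threshold
    if param > second_threshold:
        return "fricative"    # above the second threshold
    return "noise"            # otherwise: not above the second threshold

FIRST, SECOND = 0.2, 0.6      # hypothetical values, chosen only for the example
classes = [classify_frame(p, FIRST, SECOND) for p in (0.1, 0.4, 0.8)]
```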

10. Claim 14 is rejected under 35 U.S.C. 103(a) as being unpatentable over
Kushner et al. in view of Durlach et al. in view of Eryilmaz as applied to claim 1 above,
and further in view of Rezayee et al. ("An Adaptive KLT Approach for Speech
Enhancement").

As to claim 14, Kushner et al. in view of Durlach et al. in view of Eryilmaz teach 
all of the limitations as in claim 1, above. 

However, Kushner et al. in view of Durlach et al. in view of Eryilmaz do 
not specifically teach a color noise elimination unit for eliminating color noise 
from voice. 

Rezayee et al. teaches the enhancement of speech from colored noise 
(see Abstract). 

It would have been obvious to one of ordinary skill in the art to have
combined the voice recognition as taught by Kushner et al. in view of Durlach et
al. in view of Eryilmaz with the inclusion of a color noise eliminator as taught by
Rezayee et al. The motivation to have combined the references is that colored
noise consists of various noise variances and is not the same as white noise,
which has the same variance (see Rezayee et al. page 87, right column, 3rd
paragraph, lines 12-17).

11. Claims 15 and 31 are rejected under 35 U.S.C. 103(a) as being unpatentable
over Kushner et al. in view of Durlach et al. in view of Eryilmaz in view of Pastor in view
of Chong-White et al. (US 7,065,485) as applied to claims 10 and 27, above and further
in view of Rezayee et al. ("An Adaptive KLT Approach for Speech Enhancement").

As to claims 15 and 31, Kushner et al. in view of Durlach et al. in view of
Eryilmaz in view of Pastor in view of Chong-White et al. teach all of the limitations as in
claim 1, above.

However, Kushner et al. in view of Durlach et al. in view of Eryilmaz in 
view of Pastor in view of Chong-White et al. do not specifically teach a color 
noise elimination unit for eliminating color noise from voice. 

Rezayee et al. teaches the enhancement of speech from colored noise 
(see Abstract). 

It would have been obvious to one of ordinary skill in the art to have
combined the voice recognition as taught by Kushner et al. in view of Durlach et
al. in view of Eryilmaz in view of Chong-White with the inclusion of a color noise
eliminator as taught by Rezayee et al. The motivation to have combined the
references is that colored noise consists of various noise variances and is not
the same as white noise, which has the same variance (see Rezayee et al. page 87,
right column, 3rd paragraph, lines 12-17). Furthermore, it should be noted that the
elimination of colored noise is performed when speech is present.
Hence, the detection of a vocal frame entails that speech is present, and the
signal is then further enhanced by removing colored noise.
Allowable Subject Matter

12. Claims 6, 16, 17, 23, 32, and 33 are objected to as being dependent upon a 
rejected base claim, but would be allowable if rewritten in independent form including all 
of the limitations of the base claim and any intervening claims. 

13. The following is a statement of reasons for the indication of allowable subject
matter: None of the prior art alone or in combination teaches the following limitations:
NR=R/n, as recited in claims 6 and 23; "color noise ... obtained... amount of reduction
in the random parameter... due to color noise" as recited in claims 16, 17, 32, and 33.
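
For illustration only, the relation NR=R/n can be worked through numerically as below. This is a sketch under stated assumptions: it takes R to be the number of runs of consecutive identical elements, assumes the frame has already been reduced to a sequence of binary elements, and assumes n is the length of that sequence. None of these definitions is given in this action, and the function names and sample sequence are hypothetical.

```python
def count_runs(bits):
    """Count runs, i.e. maximal stretches of consecutive identical elements."""
    if not bits:
        return 0
    # A new run starts at every position where the element changes.
    return 1 + sum(1 for a, b in zip(bits, bits[1:]) if a != b)

def normalized_runs(bits):
    """NR = R / n, with R the number of runs and n the sequence length (assumed)."""
    return count_runs(bits) / len(bits)

bits = [1, 1, 0, 0, 0, 1, 0, 1]   # runs: 11 | 000 | 1 | 0 | 1, so R = 5 and n = 8
nr = normalized_runs(bits)        # 5 / 8
```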



Conclusion 

14. THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time 
policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE 
MONTHS from the mailing date of this action. In the event a first reply is filed within 
TWO MONTHS of the mailing date of this final action and the advisory action is not 
mailed until after the end of the THREE-MONTH shortened statutory period, then the 
shortened statutory period will expire on the date the advisory action is mailed, and any 
extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of
the advisory action. In no event, however, will the statutory period for reply expire later 
than SIX MONTHS from the mailing date of this final action. 

The prior art made of record and not relied upon is considered pertinent to 
applicant's disclosure. 
Neti (US 5,768,474) is cited to disclose noise-robust speech processing with
cochlea filters in an auditory model.

15. Any inquiry concerning this communication or earlier communications from the
examiner should be directed to Paras Shah whose telephone number is (571)270-1650.
The examiner can normally be reached on MON.-THURS. 7:00 a.m.-4:00 p.m. EST.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's
supervisor, Patrick Edouard, can be reached at (571)272-7603. The fax phone number
for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information 
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 

/Paras Shah/ 
Examiner, Art Unit 2626 



03/20/2008 
/Patrick N. Edouard/ 

Supervisory Patent Examiner, Art Unit 2626 



